arXiv:1503.05900vl [math.ST] 19 Mar 2015 


QUANTIFYING NUISANCE PARAMETER EFFECTS VIA DECOMPOSITIONS OF 
ASYMPTOTIC REFINEMENTS FOR UIKELIHOOD-BASED STATISTICS 

THOMAS J. DICICCIO, TODD A. KUFFNER, AND G. ALASTAIR YOUNG 


Abstract. Accurate inference on a scalar interest parameter in the presence of a nuisance parameter may 
be obtained using an adjusted version of the signed root likelihood ratio statistic, in particular Barndorff- 
Nielsen’s R* statistic. The adjustment made by this statistic may be decomposed into a sum of two terms, 
interpreted as correcting respectively for the possible effect of nuisance parameters and the deviation from 
standard normality of the signed root likelihood ratio statistic itself. We show that the adjustment terms 
are determined to second-order in the sample size by their means. Explicit expressions are obtained for the 
leading terms in asymptotic expansions of these means. These are easily calculated, allowing a simple way 
of quantifying and interpreting the respective effects of the two adjustments, in particular of the effect of 
a high dimensional nuisance parameter. Illustrations are given for a number of examples, which provide 
theoretical insight to the effect of nuisance parameters on parametric inference. The analysis provides a 
decomposition of the mean of the signed root statistic involving two terms: the first has the property of 
taking the same value whether there are no nuisance parameters or whether there is an orthogonal nuisance 
parameter, while the second is zero when there are no nuisance parameters. Similar decompositions are 
discussed for the Bartlett correction factor of the likelihood ratio statistic, and for other asymptotically 
standard normal pivots. 


1. Introduction 

We are concerned with inference on a scalar interest parameter in the presence of a, possibly high 
dimensional, nuisance parameter, based on a data sample of size n, and with identification of procedures 
which yield repeated sampling accuracy. In this setting, inference accurate to third order, that is with 
repeated sampling error of order 0(n~ 3j/2 ), may be obtained using an adjusted version of the signed root 
likelihood ratio statistic, in particular through use of Barndorff-Nielsen’s R* statistic (Bamdorff-Nielsen, 
1986 ). 

The R* statistic is particularly useful in two contexts. In full, multi-parameter exponential family 
models inference based on standard normal approximation to the sampling distribution of the R* statistic 
approximates to third order the optimal, conditional, but generally intractable, inference, which is based 
on conditioning on the sufficient statistic for the nuisance parameter. In more general models which ad¬ 
mit an ancillary statistic, taken to mean an approximately distribution free statistic which together with 
the maximum likelihood estimator constitutes a minimal sufficient statistic for the full parameter in the 
model, the normal approximation approximates to the same third order an exact inference based on con¬ 
ditioning on the ancillary statistic. A practical limitation of the use of R* is in the requirement of explicit 
specification of the appropriate ancillary, and the need to express the likelihood directly in terms of the 
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maximum likelihood estimator and the ancillary statistic. When calculation of the R* statistic is tractable, 
inference with repeated sampling accuracy 0(n -3 / 2 ) is obtained through the normal approximation. This 
same level of repeated sampling accuracy may be obtained by parametric bootstrap procedures, in par¬ 
ticular those based on simulation estimation of the sampling distribution of the unadjusted signed root 
statistic: see DiCiccio et al. (2001), Lee & Young (2005). Key to this bootstrap approach is appropriate 
handling of the nuisance parameter: third order repeated sampling accuracy is obtained by considering 
the sampling distribution of the signed root statistic when the nuisance parameter is specified as the 
constrained maximum likelihood value calculated from the observed data sample. 

Inference based on the R* statistic and the parametric bootstrap alternative sketched above are analyt¬ 
ically related. DiCiccio & Young (2008) observe that in the problem of inference on a scalar component 
of the canonical parameter in the multi-parameter exponential family context, inference based on normal 
approximation to R* may be viewed as an analytic, saddlepoint approximation to the bootstrap infer¬ 
ence. In the same way, it is readily seen that in the ancillary statistic context, inference based on R* 
may be regarded as a saddlepoint approximation to a conditional bootstrap calculation, which simulates 
the distribution of the signed root statistic conditional on the observed value of the ancillary statistic, 
with the nuisance parameter fixed at its constrained maximum likelihood value. Simulation of this con¬ 
ditional bootstrap distribution will be infeasible in many circumstances, though in certain cases, such 
as regression-scale models, simple methods of conditional simulation, employing MCMC, are possible: 
see Brazzale & Davison (2008). Alternatively, and more simply, the conditional distribution may be 
replaced by simulation of the marginal distribution of the signed root statistic. DiCiccio et al. (2015) 
demonstrate that the marginal bootstrap distribution approximates the conditional bootstrap distribution 
to second order, 0(nr '), given the ancillary statistic. 

The adjustment made by the R* statistic may be decomposed into a sum of two terms, interpreted as 
correcting respectively for the possible effect of nuisance parameters and an information adjustment, rep¬ 
resenting the deviation from standard normality of the signed root likelihood ratio statistic itself. Pierce 
& Peters (1992) proposed such a decomposition in the case where the interest parameter is a component 
of the canonical parameter in a full exponential family model. A generalization of the decomposition 
is detailed by Bamdorff-Nielsen & Cox (1994, Section 6.6.4). Starting from numerical investigations 
by Pierce & Peters (1992), it has been noted that the information adjustment is typically small when 
the adjusted information for the interest parameter, which we define formally in Section 2, is large. By 
contrast, the nuisance parameter adjustment can be appreciable when information on the nuisance pa¬ 
rameter is small, as will usually occur when its dimension is large. Crucially, however, the magnitude of 
the nuisance parameter adjustment relative to the information adjustment also depends on the structure 
of the statistical model in question, and a simple methodology for measurement of nuisance parameter 
effects for a given model is lacking. 

In this paper we note that the adjustment terms are, from a repeated sampling perspective, determined 
to second-order, 0{n ~ 1 ), in the sample size by their means. The precise definitions of the adjustment 
terms themselves are unimportant to our strategy for quantifying nuisance parameter effects, though we 
note that, except for full exponential family and transformation models, they must generally be approxi¬ 
mated, leading to only second-order accuracy from the resulting adjusted signed root statistic. Approxi¬ 
mations to R* which yield second-order accuracy include those described by DiCiccio & Martin (1993) 
and Skovgaard (1996): for a summary see Severini (2000, Section 7.5). 
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We obtain explicit expressions for the leading terms in asymptotic expansions of the repeated sam¬ 
pling means of the nuisance parameter and information adjustments. These involve calculation only of 
expectations of certain low-order log-likelihood derivatives, and are therefore easily evaluated for quite 
general models, even when the R* statistic itself is intractable. The adjustment terms have variances 
of low order 0(n ~ 2 ) and the asymptotic means therefore allow a simple, effective and general way of 
quantifying and interpreting the respective effects of the two adjustments. Of particular methodological 
interest is analysis of the effect of a high dimensional nuisance parameter on the inference based on the 
R* statistic, and by extension its bootstrap alternative. Inference based on the R* statistic, when tractable, 
represents a ‘gold standard’ in what is achievable in the inference problem and we have noted a close re¬ 
lationship between inference based on the R* statistic and parametric bootstrap inference. It is reasonable 
therefore to expect that the calculations are useful too in shedding light on operation of the parametric 
bootstrap. The repeated sampling properties of the bootstrap are, modulo Monte Carlo error introduced 
by the need in practice to construct the bootstrap estimate of the sampling distribution of the signed root 
statistic from a finite simulation, determined entirely by nuisance parameter effects, through substitution 
of unknown values by estimates. A central recommendation of this paper is that valuable insights to 
operation of the parametric bootstrap may be obtained by identification of the explicit way in which the 
means of the nuisance parameter and information adjustments depend on the nuisance parameter. As 
we shall see in Section 4, in certain key problems these quantities depend only on the dimension of the 
nuisance parameter, and not on its actual value. In such cases we may reasonably expect good repeated 
sampling accuracy from the bootstrap, as precise specification of the nuisance parameter values in the 
calculation is unimportant. In other situations, we observe that the value of the nuisance parameter has 
a more substantial effect on the adjustment means, in which case we may be alert to impaired accuracy 
from the bootstrap and its analytic alternatives, especially with small sample sizes. 

Our analysis provides a decomposition of the mean of the signed root statistic involving two terms: the 
first has the property of taking the same value whether there are no nuisance parameters or whether there 
is an orthogonal nuisance parameter, while the second is zero when there are no nuisance parameters. 
Similar decompositions are discussed for the Bartlett correction factor of the likelihood ratio statistic, 
and for other asymptotically standard normal pivots, in Sections 5 and 6 respectively. 


2. The inferential problem 

Suppose that Y = (Y \,..., Y n ) is a continuous random vector and that the distribution of Y depends 
on an unknown d-dimensional parameter 9 = (9 1 ,... ,9 d ), partitioned as 6 = (ip, cp), where ip = 9 l 
is a scalar interest parameter and cp is a nuisance parameter of dimension d — 1. Let L(9) be the log- 
likelihood function for 6 based on Y and let 9 = (ip, cp) be the global maximum likelihood estimator of 
9. Further, let 9 = 9(ip) = (ip, cp) = {ip, <p(ip)} be the constrained maximum likelihood estimator of 9 
for given ip. Then the profile log-likelihood function for ip is M(ip ) = L{9(ip)} and the likelihood ratio 
statistic for ip is W(ip ) = 2 {M(ip) — M(ip )}, where M(ip ) = L(9 ), since 9(ip) = 9. The signed root 
likelihood ratio statistic is R(ip) = sgn (ip — ip){W(ip)} 1 ! 2 . Then, for example, testing H 0 : ip = ip 0 
against H a : ip > ip 0 or H a : ip < ip 0 can be based on the test statistic R(ipo). Asymptotically, as 
the sample size n increases, the sampling distribution of R(ip) tends to the standard normal distribution. 
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Specifically, R{ip) is distributed as standard normal to first order, to error of order 0(n _1//2 ). By contrast, 
the R* statistic is distributed as standard normal to error of order 0(n -3 / 2 ). 

The R* statistic is defined by 

(1) iT(V0 = Rty) + R^Y 1 log (f(0)0R(0)), 

where u(0) is given (Barndorff-Nielsen, 1986) by 


( 2 ) 


®W>) = 


L-,iW ~ L d s 1 


/{fcwr / 2 bwr /2 }. 


Here, it is supposed that the log-likelihood function has been written as L(9\ 9, a), with (9, a) minimal 
sufficient and a ancillary, that is with a distribution which, at least approximately, does not depend on 9. 
Further, 

n r\ 2 

L. d m = L.,{e I e, a) = -Me-e, a), L 0 (e) = L tiS (e- 6, a) = —Me- 6, a). 

o9 ocpoO 

Also, j denotes the observed information matrix, j{9) = (— L rs {9)), with L rs {9 ) = <9 2 L{9) / 30 r 89 s , and 
denotes its (0, 0) component. The sampling distribution of f?*(0) is standard normal conditionally 
on a, and hence, as noted, unconditionally, to error of third order 0(n -3 / 2 ). Note that in a full exponential 
family model, 9 is already itself sufficient, and no ancillary statistic a is required. The expression for t>(0) 
given by © therefore simplifies somewhat: see, for example, Barndorff-Nielsen & Cox (1994, Example 
6.19). 

Barndorff-Nielsen & Cox (1994, Section 6.6.4), generalizing Pierce & Peters (1992), introduce quan¬ 
tities NP(0) and INF(0), both of order O p (n _1//2 ), such that R*( 0) = i?(0) + NP(0) + INF(0). 
Explicitly, we have 

NP(V0 = —57T\logC'(V>), 


where 


C(0) = 


RW 

{iMommY 12 




with L^(9) = 9, a) = d 2 L(9; 9, a)/<90<90 and, as before, denoting the (0, 0) component of 

the observed information j. Also, 

INF(0) = - 7 -^ 7 -log {u{i/j)/R(i/})}, 


where 


R(Y) 

«( 0 ) = j P W~ 1/2 -^y{ M W ~ Mty)}. 

O'lp 


Here j p is the profile observed information, j p (0) = —d 2 M(vj)/dip 2 , and the derivative with respect to 
0 is calculated with M( 0 ) — M( 0 ) considered as a function of 0 , 0 , 0 ( 0 ) and a. 
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Calculation of R*(ip) supposes explicit representation of the log-likelihood as a function of (8, a). 
Other formulations of the adjustment v(ip), due to Fraser and co-workers, are possible. The tangent 
exponential model introduced by Fraser (1990) avoids the need to specify the transformation Y —> (6, a), 
though still requires awkward analytic calculation: a useful summary is given by Brazzale et al. (2007, 
Chapter 8). In general, however, it is necessary to approximate to the quantity v(ip). Replacing v(ip) 
in the definition (OQ) of R*(w) by an estimate v('ip) typically yields an adjusted version of the signed 
root likelihood ratio statistic distributed as standard normal only to error of second order, 0(n~ 1 ). A 
computationally attractive approximation based on orthogonal parameterisation (Cox & Reid, 1987) is 
described by DiCiccio & Martin (1993). The approximation due to Skovgaard (1996) is theoretically 
attractive in that it also provides large deviations protection. 

To develop our analysis, some further notation is required. Let Lg(8 ) denote the score function, the 
vector with components L r {9 ) = dL{6)/d0 r ,r = 1 ,,d. In the calculations that follow, arrays and 
summation are denoted by using the standard conventions, for which the indices r,s,t,... are assumed 
to range over 1 ,,d. Summation over the range is implied for any index appearing in an expression 
both as a subscript and as a superscript. As above, differentiation is indicated by subscripts. Then 
E{L r {ff }} 0, let \ rs E{L rs (0) j-, X rs t E\^L rs t(6 ^)}, etc., and put l r L r {8 ), l rs L rs {8 ) X rs , 

l rst = L rst {8 ) — X rst , etc. The constants X rs , X rst ,..., are assumed to be of order 0(n). The variables l r , 
l rs , lrst, etc., each of which have expectation 0, are assumed to be of order O p (n 1//2 ). The joint cumulants 
of l r , l rs , etc. are assumed to be of order 0(n). These assumptions will usually be satisfied in situations 
involving independent observations, or structured dependence, such as in time series contexts. It is 
useful to extend the A-notation: let A rjS = E(L r L s ) = E(l r l s ), X rS}t = E(L rs L t ) = E(l rs l t ), etc. Bartlett 
identities involving the A’s can be derived by repeated differentiation of the identity f exp {L(8)}dy = 1; 
in particular, 

Xrs T Xr : s 0; X rs f T X rs j T A r £ ;S T X stt r T 9. 

Differentiation of the definition X rs = f L rs {8) exp {L(9)}dy yields X rs / t = X rst + X rSit , where X rs / t = 
dXrs/dO 1 . Further, let (A rs ) be the d x d matrix inverse of (A rs ), and let rj = —1/A 11 , r rs = r]X lr X ls , and 
i/ rs = \ rs + T rs . Thus, X rs , r rs , and u rs are of order 0(n _1 ), while 77 , which is what we have termed the 
adjusted information for ?/), is of order (){n). 

DiCiccio & Stern (1994a) showed that R(iJj) = ^^{Ri + R 2 + O p (n _3//2 )}, where R\ = —A lr / r and 
R 2 = X lr X St Ut + \X lr T St Ut - \X lr X SU U tV X rst lJv - \X lr T SU T tV X rst l u l v . 

Note that R,\ is of order O p (n~ 1//2 ) and R 2 is of order O p (n _1 ). Since E(Ri) = 0, it follows that 
(3) E{R{^)} = rj 1/2 {X lr X st X rS}t + \X lr T st X rS)t + \X lr X st X rst + iA^r^A^} + 0(n~ l ). 

3. Expectations of adjustments 

Detailed analysis given in the Appendix shows that we may approximate 77(INF( 7 /))} to 0(n~ l ) by 

9inf( 8) = g l ^X l ‘ T st (\X rS)t + |A rs t), 
and 77{NP('0)} to the same order by 

Unp (9) = - V 1/2 X lr v st (X rs , t + lX rst ). 
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These expansions permit a full statistical interpretation of the adjustment terms NP(-0) and INF(U), 
which we do through a series of remarks. 

Remark 1. We begin by examining E{R(ip)} when there are no nuisance parameters. If nuisance pa¬ 
rameters are absent, then A 11 = (An) -1 , r] = —An, r 11 = (—An) -1 , and a 11 = 0, and it follows 
that 

E{R(i/j)} = (—\n) 3 ^ 2 (|Aii i i + |Am) + 0(n 1 ). 

Remark 2. The quantities <?inf( 0) an d < 7 np($) are related to asymptotic quantities detailed by Efron (1987) 
in description of the ‘bias corrected accelerated’, BC a , method of construction of bootstrap confidence 
intervals, which is analysed in detail by DiCiccio & Efron (1996). Specifically, we have giNp{0) = ao 
and (?np( 6 i ) = z f) — a 0 , where a 0 = a o {0) and z 0 = z 0 (6 ) are respectively acceleration and bias-correction 
quantities. The quantity a 0 satisfies (DiCiccio & Efron, 1996) 

ao = —(skew((7) + skew(T)} + Ofn _1 ), 

6 

where U = {ip — ip)/a, with a 2 the variance of ip, given by a 2 = <J 2 (0) = A 1,1 + 0(n” 2 ), and T = 
pip — ip)/a, with a 2 = n 2 (f)). Further, z 0 is interpreted by 

$(z 0 ) = Pr( , 0 <ip) + Opn~ l ), 

where $ is the standard normal distribution function. 

DiCiccio & Efron (1996) note that the quantities a 0 and z 0 are invariant under reparameterisations of 
the model. Therefore, in using the asymptotic adjustment expectations <? IN f( 0) and g NP (0) to interpret 
nuisance parameter effects on the inference on ip, there is no restriction in assuming that the model under 
analysis is parameterised so that the interest parameter ip and the nuisance parameter cp are orthogonal 
(Cox & Reid, 1987). Therefore, now suppose there is a vector nuisance parameter <p present, but assume 
that the interest parameter ip and the nuisance parameter (p are orthogonal; then A 11 = (An) -1 , r/ = —An, 
A la = 0 (a = 2,..., d), T rs = 0 except when r = s = 1, in which case r 11 = (—An)” 1 , and 

£{INF(V0} = -(—Anr 3/2 (|An,i + |A m ) + O^” 1 ). 

Therefore, following Remark 1, to error of order 0(n _1 ), E {INF(p) } is seen to correspond to a mean 
adjustment for the signed root statistic R(ip) in the problem where the orthogonal nuisance parameter (p is 
known. Since the standard normal approximation to the distribution of R(ip) is typically rather accurate 
in scalar parameter cases without nuisance parameters, the mean adjustment should be quantitatively 
small quite generally, so we can anticipate that INF(?/;) is typically small. 

Remark 3. For general parameterisations, we have a 11 = //' 1 = u lb = 0 for a, b = 2,..., d, and thus, 

£{NP(V0} = — 77 1 / 2 A lr i/ a 6 (A ro , 6 + iA ra6 ) + 0(n- 1 ) 

= r] l,2 \ lr u ab {\\ rab - \ ra/b ) + 0 (n _1 ), 

where A ra/b 7A r(! / DO and A ra/b A ra,b T A rab • 

Under orthogonality, v ab = X ab for a, 6 = 2,..., cl, and the condition Ai a = 0 for a = 2,..., d 
implies that X la / b — 0 for b = 2 ,..., d, so that the identity Xi a / b = Xi a , b + Ai ab yields Ai ajf) = — Ai afe for 
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a, b = 2,..., d. Hence, nuisance parameter effects may be quantified from the expression 

E{ NP(0)} = —1(—An) -1/2 A a6 A afel + 0(n -1 ). 

Note that this gives [3\ = r] 1 / 2 E{N~P( , ip)} + 0(n -1 / 2 ) = — |A ab A abi + 0(n 1//2 ). Since the expansion for 
77{NP(0)} involves a multiple sum over the nuisance parameters, we see that NP(0) can be anticipated 
to be large when the number of nuisance parameters is large. 

Remark 4. Some further insight into NP(0) in the orthogonal case can be gleaned by noting that 


<9 log det [-L ab {6{^)}\ 
<90 


L ab (6)L abl (6) + O p {n~ 1 / 2 ) = X ab X abl + O^n" 1 / 2 ), 


which further relates 77{NP(0)} to the specific adjustment function of Cox & Reid (1987). Thus, in this 
orthogonal case, if log det{— L ab {6)} does not change rapidly with 0, such as when L{6) = //(0) + h(4>), 
in which case det{— L ab (9)} is constant with respect to 0, then A ab A a bi is small in magnitude, and hence, 
we would expect NP(0) to be small in magnitude; see also the discussion in Cox & Reid (1987). 
Remark 5. There is one further interpretation of NP(0) that is worth noting. DiCiccio & Stem (1994a) 
showed that the difference between 0 and 0 is 

0 - 0 = -A 11 /?! + O p (n~ 3 / 2 ) = + O p (n~ 3 ' 2 ) = rT 1/2 £{7VP(0)} + O p (n“ 3 / 2 ), 


and hence, this difference, when in expressed in terms of standard deviations of 0, is 


t-Jt.=E{NPm + O r (n~'). 

Remark 6. Note that the quantities //np(9) and .Qinf (C) are both of order ()(rr ^ 2 ). As we shall illustrate, 
calculation of the individual values provides important statistical insight. We propose further that a simple 
measure of the relative influence within the assumed model of the nuisance parameter on inference on 
the interest parameter 0, independent of the sample size n, might be obtained by considering their ratio 

5 , np(^)/s , inf(^)- 

Remark 7. In general, the quantities .gv-pfP) and //inf (9) depend on the unknown parameter 6. In practice, 
following the bootstrap principle, they may be estimated by g^p(0) and //inf( 9) respectively. An adjusted 
version of the signed root statistic R(ip), easily calculated in practice, once //np( 6 i ) and //inf(^) have been 
calculated, is given by i? a (0) = R(ip) + g ^ p ( 9 ) + /?inf(^)- Since /?np($) — 9np{9) = O p {n~ l ), we have 
that R a (0) = i?*(0) + () p (rr l ), and therefore that R a (u) has the standard normal distribution to error 
of order 0(n~ l ). DiCiccio & Efron (1996) previously remarked that i?(0) + Zq{6) is standard normal to 
error of order 0 (n -1 ), but did not investigate practical use of this statistic for inference: an alternative 
is the statistic 70(0) = 77(0) + Zq{9). Although no claim of desirable large deviation properties of the 
ki nd enjoyed by the method of Skovgaard (1986) can be made for this statistic, empirical evidence, not 
reported here, suggests that it nevertheless yields highly accurate inference in many settings. 

Remark 8. Note that the asymptotic regime adopted here is one in which the dimensionality <7 — 1 
of the nuisance parameter 0 remains fixed as the sample size n increases. However, we propose that 
examination of the quantities //np( 9) and //inf( 9) and their ratio is a useful device to quantify the effect 
of an increasing dimension of nuisance parameter on the inference, as we shall illustrate in the next 
Section. For stratified models, such as those in Examples 2, 4, 5 and 6 below, Sartori (2003) noted 
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that, when both the sample size n within each stratum and the number of nuisance parameters q tend to 
infinity, NP(ip) = O p (qm ~ 1 / 2 ), while INF(ip) = O p (m~ 1 ^ 2 ), where m = nq is the total sample size, 
irrespective of the nature of the sequence {q, n}. Hence, the ratio N P(ip) / IN F(ip) = O p (q) in such an 
asymptotic regime, consistent with calculations given in Examples 2, 4, 5 and 6 below. Relative to the 
inference adjustment, the nuisance parameter adjustment increases at a rate proportional to the dimension 
of the nuisance parameter. 


4. Examples 

We consider here a number of theoretical and numerical examples. 

Example 1. Normal linear regression. Let Y\..... Y„ denote independent random variables of the 
form Yi = xf /3 + af t , where x ,\..... x n are known covariate vectors of length q, a is an unknown scalar 
interest parameter and [3 is an unknown nuisance parameter vector of length q, so that 6 = (a. (3). The e t 
are assumed to be independent standard normal random variables. 

In this case, n 1 / 2 g mF (9) = 2 1 / 2 /3 and n 1 /, 2 c/ N p($) = q/2 1 / 2 . Note that these quantities do not depend 
on the parameter value 9, while // = 2 n/a 2 . Nuisance parameter effects are determined, to second order, 
only by the dimensionality of the nuisance parameter (3, not its value. This observation in turn would 
suggest that inference based on the bootstrap distribution of R(a) should be highly accurate. In fact, 

R(cr) is a simple function of a 2 /a 2 , which has a distribution free of 9: (n — q)d 2 /o 2 is distributed as 
chi-squared on n — q degrees of freedom. A bootstrap calculation will, modulo simulation variability, 
reproduce the exact sampling distribution of R(o). 

Example 2. Neyman-Scott model. Let Y tJ , for i = 1 ,... ,n and j = 1..... r/ be independent Gaussian 
random variables, with Y t j being distributed as N(pj, a 2 ). The interest parameter is o, with nuisance 
parameter (pi,..., p q ), so that 6 = (a, ..., n q ). 

Now we calculate n 1 / 2 ( 7 i NF ( 6 l ) = l/{1.5(2g) 1//2 }, with n, 1 // 2 g NP (0) = (g/2) 1//2 , so that 5 r NP ( 6 >)/t 7 INF ( 6 >) = l-5q. 
Again, these quantities do not depend on the value of 6 , only the dimension q of the nuisance parameter. 

The adjusted information is given by 77 = 2nq/o 2 . As in Example 1, the signed root statistic f?(er) has 
a distribution free of the parameter value: it is a function of the pivotal quantity a 2 jo 2 , and its exact 
sampling distribution can be constructed by bootstrapping. 

A related problem concerns a generalisation of the Behrens-Fisher problem, in which we observe Y rj , 
for % — 1,..., n and j = 1..... r/ to be independent Gaussian random variables, with Y l3 being distributed 
as N(n, a 2 ). The interest parameter is the common mean fi, with (a 2 ,..., a 2 ) as nuisance. In this case, 
we see that f?{INF( , 0)} and f?{NP(-0)} are both 0(n~ 1 ), not 0(n -1 / 2 ). Nuisance parameter effects are 
quantitatively slight though, by contrast with what is noted above, in this case the signed root statistic 
R(fi) is not exactly pivotal, and the bootstrap inference is not exact. Limited numerical results given by 
Young (2009) for the case q — 2 would indicate, however, that the bootstrap inference is highly accurate 
even for small sample size n. 

Example 3. Exponential regression. Suppose Y\..... Y n are independent exponential random vari¬ 
ables, with means depending on given covariate values. We suppose for simplicity the case of two 
covariates, though our conclusions extend immediately to the case with a general number of covariates. 

So, we suppose Yi is exponentially distributed with mean fa exp (—^Zi — fawi), with ^ z; = ^ Wi — 0, 
and ip the interest parameter. Routine calculations show that (?inf(^) and g^p(9), though complicated 
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functions of the covariate values, are again free of the parameter 9 = (ip, (pi, © 2 )- Further, the signed root 
statistic R(ip) is again easily seen to be exactly pivotal, and bootstrap inference is once more exact. 

In the simple case of a single covariate, with E(Yi ) = (pexp(-ipZi), with Zi = 0, we have 

£{NP(V0} = 0 + 0(n-‘), £{INF(V0} = -(££*?) + °( n ^ : 

the nuisance parameter adjustment has expectation of smaller order of magnitude than that of the infor¬ 
mation adjustment. 

We consider now from a numerical perspective three examples with many nuisance parameters previ¬ 
ously discussed by Sartori et al. (1999). In each, we provide illustration of dependence of the measure 
<7np(0)/<7inf(0) on the dimensionality of the nuisance parameter. 

Example 4. Inverse Gaussian model. Let Yj, for i = 1 ,,n and j = 1.... .q be independent, 
inverse Gaussian random variables, with Y tJ having probability density 

f{y\^Aj) = {^/(^)} 1/2 y~ 3/2 expj-K #- 1 + <f>jy) + (V^) 1/2 }> y > o, 

where ip > 0 and (pj > 0, so that 9 = (ip, (pi,..., <p q ) and the overall sample size is m — nq. 

Simple algebraic manipulations show that, independently of the parameter value 9, n 1 ^ 2 gmF(9) — 
—l/{1.5(2g) 1 / 2 }, and n^ 2 g^p(9) = —(qj 2) 1 / 2 , so that 

<7np(0)/<7inf($) =1.5 q in this model. We note that in this model the adjusted information for ip is given 

by r] = nq/(2ip 2 ). 

Example 5. Multi-sample exponential model. Let Y iq , for i = 1 ,... ,n and j = 1^ be indepen¬ 
dent, exponential random variables, with Y tJ having mean l/(pj. The parameter of interest is 

9 

ip = g' 1 ^exp(-0 i f o ), 

3 = 1 

where t 0 > 0 is a fixed constant and 9 = (ip, 0), with the nuisance parameter (p = (0 2 , • • •, (p q ). As noted 
by Sartori et al. (1999), qip may be interpreted as the expected number of items failing by t 0 in a parallel 
system with failures rates <pi,...,<p q . 

The interest parameter ip is therefore a nonlinear function of the canonical parameter in a full ex¬ 
ponential family model. Again, construction of the information and nuisance parameter adjustments 
INF(-0) and NP(0) is straightforward, though the constrained maximum likelihood estimator 9 must be 
calculated numerically. 

By contrast with previous examples, in this model the ratio g^p(9)/gmp(9) depends on the value of 
the parameter 9. Values illustrating the effect of increasing nuisance parameter dimension are given in 
Table 1 for two cases. In both t 0 = 0.5: case (a) considers (pi = 1 ,i = 1,... ,q, so that ip =0.6065; 
case (b) fixes ip =0.0333 for each dimension of nuisance parameter, sets exp(— (p q t 0 ) = qip/ 2 and fixes 
(pi = ... = (p q -i, the common value being determined by the specified ip. Acute dependence of the 
ratio on the actual parameter values, rather than just the nuisance parameter dimension as in previous 
examples, is apparent. 

Example 6. Curved exponential family model. Our final example concerns a model for which calcula¬ 
tion of R*(ip ) is intractable: the sample space derivatives, derivatives of the log-likelihood with respect 
to the maximum likelihood estimator, required by the construction © of R*(ip ), must be approximated. 
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Table 1. Dependence of ratio g-^p(O)/g WF (d) on q, multi-sample exponential model. 

Case (a) has fa = 1, i — 1,..., q, case (b) has fa — ... — fa-i, with exp(— fat 0 ) = qip/2. 

q 2 5 10 20 50 

(a) 2.25 9.00 20.25 42.75 110.25 

(b) -2.10 -5.50 -8.56 -15.76 -130.29 

Table 2. Dependence of ratio g NP (0) /< 7 inf($) on q, multi-sample curved exponential 
family model. Case (a) has 0 = 1, Hi = i, i — 1,..., q, case (b) has -0 = 1, Hi = 1, i = 

q 1 2 5 10 20 50 

(a) 1.11 2.45 6.77 14.17 29.09 74.01 

(b) 1.11 2.21 5.53 11.05 22.11 55.26 

By contrast, the calculations required to evaluate <7 inf( 0) an d //np(^) are no more complex than in the 
other examples. 

Let Yij, for i — 1,..., n and j = 1..... <7 be independent normal random variables with means /x ; > 0 
and variances 0/x ■ . This model constitutes a curved exponential family. The parameter of interest is 0, 
with Hi ,..., n q as nuisance parameters, 6 = (0, /xi, ..., g q ). 

Again, the ratio g^p{0) / g mF {0) depends on the value of the parameter 6. Illustrative values are given 
in Table 2, for two cases: case (a) has 0 = 1, Hi = 0 i = 1, • • •, q, while case (b) has 0 = 1, Hi = 1, i = 
l,...,q. 

5. Decomposition of the Bartlett correction factor 

Recall that the sum of (?inf( 6 i ) and g N p(0) is, to 0(n -1 ), equal to 

E{-R(fa} = -r ] 1 ' 2 {\ lr \ 8t \ rs ,t + \\ lr T st \ rS)t + ^X lr X st X rst + ^X lr r st X rst ) 

= ~V 1/2 X lr X st (X rStt + |A rst ) - v 1/2 X lr T st (±X rS!t + |A rst ). 

To decide how we might choose . 9 inf(^) and g^p(d) in a decomposition of this sum, consider imposing 
two conditions: first, gj^viO) must take the same value whether we have no nuisance parameters or 
we have orthogonal nuisance parameters; and second, g^pi/X) must be 0 when we have no nuisance 
parameters. These conditions suggest that T rs and u rs play a key role. Note that r 11 = (—An ) -1 
when there are no nuisance parameters, while for orthogonal nuisance parameters T rs = 0 except when 
r = s = 1, in which case r 11 = (—An) -1 . Thus, T rs is the same in the orthogonal nuisance parameter 
case as it is when nuisance parameters are absent. On the other hand, since u rs = 0 whenever either or 
both of r and s are 1, we have that u ] 1 = 0 when there are no nuisance parameters. It is readily seen that 
the decomposition of the sum into g^pfa) and g^p(d) according to the two conditions can be achieved if 
we substitute X st = v st — T st in the sum and then take 5 'inf(^ 1 ) to consist of those terms involving r st and 
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take <7 np( 0) to consist of those terms involving v st . We demonstrate here that the same reasoning may be 
applied to obtain a decomposition of the Bartlett correction factor for the likelihood ratio statistic W (0). 

Lawley (1956) showed (see also DiCiccio & Stern, 1994a) that the expectation of kk(0) is i?{kk(0)} = 
1 + b(9 ) + 0(n~ 3//2 ), where 

b(0) = (X rs \ tu - V rS V tU )(±X rstu - X rst/U + X r t/su) 

- (X rS X tU X VW - U rS U tU U VW )(lX rst Xu VW - KsAuv/w + Ks/tKv/w) 

_ ( X ru X sw X tv _ u rU U SW U tv )(lX rst X uvw - X rst X uv/w + X rs/t X uv/w ). 


We now decompose b{6) into the sum b(9) = bm P (9) + & NP (0), where 6inf($) is the same whether we 
have no nuisance parameters or whether we have orthogonal nuisance parameters, and b^ P (9) is 0 when 
there are no nuisance parameters. We make the substitution X rs = v rs — T rs in b(9): &inf($) consists of 
those terms involving the r rs but not the v rs \ &np($) consists of those terms that involve the u rs in any 
way. 

Succinct expressions for b WP (9) and 6 NP (6>) derived this way are 


^INf(^) T T (^A rstu Aj -st/u "f" A rt/su) 

■ rs tu vw /1 \ \ _ \ \ I \ \ \ 

\ T T T \ ^A rs iA uvw ^rst^uv/w "T" ^rs/t^uv/w) 

I ru sw tv (1\ \ _ \ \ _i_ \ \ \ 

\ T T T \^A rs iA uvw ^rst^uv/w i A rs jiA uv jw) •) 


and 


&np(0) = (X rs X tu - T rs T tu - v rs v tu ){\x rstu - X rst/U + X rt/su ) 

(\rs\tu\vw | _rstuvw , .rs, .tu, .vw\/1 \ \ \ \ , \ \ \ 

\A A A ~r T T T V V V )\ ^^rst^uvw ^rst^uv/w i ^rs/t^uv/w) 

( \ru\sw\tv | _ruswtv _ ru n t sw n ,tv\ (1 \ \ \ \ , \ \ \ 

yA A A i T T T V V V )\ Q^rst^uvw ^rst^uv/w i '^rs/t'^uv/w) • 

If there are no nuisance parameters or there are orthogonal nuisance parameters, then 
&inf($) = (An) 2 (|Aim — Ain/i + An/ii) 

— (An) 3 (tAiiiAih — AniAii/i + An/iAn/i) 

— (An) 3 (|AiiiAih — AniAii/i + An/iAn/i). 

Note that if there are no nuisance parameters, u 1 1 = 0 and r 11 = —A 11 , so that b NP (f)) is identically 
zero. It is useful to evaluate b PP (9) in the case of orthogonal nuisance parameters to show better the 
effect of nuisance parameters. Now, by making the substitution X rs = v rs — T rs , we have 


b NP (9) = {(l/ s - T rs ){v tu - T tU ) ~ T rS T tU - V rS V tU }{\X rstu - X rst/U + X rt/su ) 

_ {(j/ 8 — T rS )(u tU — r tu ){y vw — T VW ) + T rs T tu T vw — u rS U tU U VW } 

X ( ^^rst^uvw ^rst^uv/w ^rs/t^uv/w') 

_ {(i/ ru — r ru )(v sw — r sw )(y tv — r tv ) + T ru T sw T tv — v ru v sw v tv } 

^ ( 'Q^rst^uvw ^rst^uv/w ^rs/t^uv/w') 
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= -(r r V“ + V rS T tU )(\\ rstu - Kst/u + Kt/su ) 


(-.rs—tu- ,vw i ^rs,,tuvw . , rs tuvw rs,,tu,,vw , .rs^tu, ,vw , ,rs _ ,tu^vw\ 
— (T T V + T V T + V T T — T V V — V T V — V V T 


X ( ^^rst^uvw ^rst^uv/w A rs/t^uv/w ) 


(^ru^sw^.tv | _-ru. ,sw—tv \ , ,ru ^sw ^tv r'u^.sw^.tv . ,ru—sw, ,tv ^ ,ru ^ .sw ^.tv \ 

— (T T + T V T V T T — TV V — V T V — V V T 


^rs/t^uv/w) • 


X ( Q^rst^uvw ^rst^uv/w ^rs/t^uv/w) 

We consider each of the terms in fr NP (0) separately under orthogonality: 

- ( T rS V tU + V rS T tU )(\ X rstu ~ X rst / U + Kt/su) 

= (All) 1 A a6 (|Aiia6 — Alab/i — Aiia/fi); 


/ r s —tu vw | rs , tuvw . , rs tuvw rs^.tu^.vw _ rstu^ ,vw ni rs^,tuvw\ 

[T T V + T V T + V T T — T V V — V T V — V V T ) 

X (^^rst^uvw ^rst^uv/w H“ A rs/t^uv/w ) 

— — (All) 2 A a6 (|AmAi ab + jAiiaAiift — AiaftAn/i + All /1 A a 6/i) 

— (All) A A (^AllaAfrcd + ^Ai a 5Ai C( ^ Ai la Abc/d ^lab^lc/d ^H/a^bc/d) j 


(-ru—SW- ,tv i ^~ru n .sw^tv . ^.ru—SW^tv -ru-.sw-.tv - ,ru—sw, ,tv , .ru, .sw ^.tv \ 
(T T V T V T + V T T —TV V — V T V — V V T ) 


X ( Q^rst^uvw ^rst^uv/w H“ ^rs/t^uv/w) 

— — (An) 2 A a& (^An a Aii b — An a An/b — An a Aib/i) 

— (An) 1 A afl A cd (AAi ac Aibd — AiocAw/i). 

The resulting formula for fe NP (0) in the presence of orthogonal nuisance parameters is 

^Np(^) = (All) 1 A a6 (|Aiiob — Alab/I — All a/b) 

— (An) "A afc (AniAi a b + |Aii a An b 

— AllaAn/6 — AiiaAife /1 — AiafiAn /1 + An/iA a &/i) 

— (An) 1 A a6 A cd (|An a A6 C d + |Ai a bAi CC ( + |Ai ac AiM 

An a Abc/d Ai a bAi c /(i AiacAfed/i T An/ a A bc/d- 


Just as for g^p(9) in the case of orthogonal nuisance parameters, we see that & NP ($) involves multiple 
sums over the indices for the nuisance parameters, so &np (9) can be expected to be large when the number 
of nuisance parameters is large. 

An interesting feature emerges from comparing the formulas for gsp(9) and &np( 0) in the orthogonal 
nuisance parameter case. While the expression for g NP (0) involves a double sum over the indices for the 
nuisance parameters, the expression for b-^p (0) involves both double and quadruple sums. Consequently, 
we might reasonably expect the ratio &np(0)/&inf( 0) to grow more rapidly with the number of nuisance 
parameters than does the ratio <7np(0)/<7inf(0)- This phenomenon is apparent in Example 1, for which 
9np( 9)/g mF (0) = 3g/2. It turns out that fr INF ($) = n~ l A and & NP ($) = n~ 1 (q 2 +q), so &np(#)/&inf(#) = 
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3(7/ 2 + q). In this example, the ratio &np(0)/&inf( 0) grows quadratically with the number of nuisance 
parameters, while the ratio g^p(9)/gmp(9) only grows linearly. 

6. Decompositions for other pivots 

So far, our focus has been on inference based on an adjusted version of the signed root likelihood ratio 
statistic; however, other pivots that are asymptotically standard normal also find widespread use, notably 
the Wald-type pivots based on the difference ip — ip and the score-type pivots based on the derivative 
M±(ip) = dM(ip)/dip = Li{9(ip)}. DiCiccio et al. (2015) provide analysis of circumstances where in¬ 
ference, such as p— values, obtained by bootstrapping various first-order asymptotically equivalent pivots 
will agree to higher-order with that obtained from the signed root statistic. It is of interest to assess the 
impact that nuisance parameters have on higher-order adjustments obtained by Cornish-Fisher transfor¬ 
mation to these other pivots. We examine the structure of these adjustments in terms of the quantities 
9inf{0) and (Jkp(O), to allow explicit comparisons with inference based on R(ip). 

Let T(ip) denote an asymptotically standard normal pivot, and let its cumulants be denoted by k 2 , 
etc. Typically, the mean K\ and skewness k, 3 are of order 0(n~ l/2 ), while the variance n 2 = 1 + 0(n~ 1 ); 
the fourth and higher-order cumulants are of order 0(n ~ l ) or smaller. Central to higher-order inference 
based on T(ip) is the Cornish-Fisher transformation T — T 2 — Ki + Ak 3 , which has the standard 
normal distribution to error of order 0(n~ l ). The Cornish-Fisher transformation of R(ip) agrees with the 
R*(ip) statistic to error of order O(n~ 1 ). The adjustment terms |k 3 and —K\ + | u 3 that appear in the 
Cornish-Fisher transformation depend on 9, so they would need to be estimated to achieve higher-order 
inference in practice. An interpretation of the adjustment made by the Cornish-Fisher transformation 
is that whether or not a mean adjustment suffices to make the desired correction hinges on the order of 
k 3 . This is an important factor differentiating the signed root statistic from other asymptotically standard 
normal pivots. 

We report K\ and k 3 for some common choices of T(ip). For T(ip) = R(ip), we have seen that 
Ki = — gmp(9) — gjsip{9) + 0(n -1 ); in this case, = 0(n -1 ). Consequently, higher-order inference 
based on R(ip) requires estimation of K\ only, and estimation of k 3 is not necessary. 

To report and k 3 for other pivots T(ip), it is convenient to introduce one further asymptotic quantity 
in addition to gwF{9) and g NP (#). This quantity is d = d{9) = —r] l R^X lr T st X rst , which arises quite 
naturally from the profile log-likelihood function. It turns out that the third derivative of the profile log- 
likelihood function evaluated at ip is M 3 (ip) = rf^Qd + () p (n 1 ^ 2 ). The quantity d is also related to 
Efron’s (1987) asymptotic adjustments a 0 and c q , which were discussed by DiCiccio & Efron (1996): 
d = 2a 0 + c q . Furthermore, in terms of (?inf( 6 i ), gNp{0), and d, the mean of ip is E{ip) = ip — (2gi NF (9) + 
9np( 9) - d)7]~ 1/2 + 0(n -3 / 2 ). 

A key property of the quantity d is that it is the same whether there are no nuisance parameters or there 
are orthogonal nuisance parameters. In both cases, the formula for d becomes d — —(—An) 3//2 |A m . 
Thus, d is similar to (?inf(^) : we would not expect d to grow with the number of nuisance parameters. 
The quantity d does differ from gmp(9) and g^p{9) in one important respect: while g\^p(9) and gppiO) 
are invariant under reparameterizations 9 = (ip, <p ) —)■ {g(ip), h(ip, (p)}, where (p = (9 2 ,..., 9 d ) contains 
the nuisance parameters and g(ip ) is a monotonically increasing function, d does not enjoy the property 
of invariance. 
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We next consider the Wald statistic with observed information, T(ip) — (R — ip)/(— L 11 ) 1 / 2 , and the 
Wald statistic with expected information, T( , 0) = (^ —^)/(—A 11 ) 1 / 2 = (ip — ip)^ 1 ^ 2 . The distributions of 
these pivots are the same to error of order 0(n~ l ). For both Wald statistics, K\ = — {//inf(#) + <?np(0) + 
d} + 0(n _1 ) and k 3 = —6 d + 0(n _1 ). Consequently, the Wald statistics are similar to the signed root of 
the likelihood ratio statistic in that nuisance parameters affect the higher-order adjustment terms through 
g-Np(0), which is involved in 

Finally, we consider the score statistic with observed information, T(ip) = (?/>)(—L 11 ) 1 - 72 , and the 

score statistic with expected information, T(ip) = M 1 (i/p)(— A 11 ) 1 / 2 = M 1 (^)i 7 -1 / 2 . Just as for the Wald 
statistics discussed above, the distributions of these pivots agree to error of order 0(n -1 ); for these score 
statistics, aci = — {g W p(9)+g N p(9)—2d} + 0(n~ 1 ) and ac 3 = 12d+0(n _1 ). Again, nuisance parameters 
influence the higher-order adjustment terms through g^ip(9), which is a component of K\. 

An important property of the profile log-likelihood function M (pip) is that the expectation of the profile 
score is E{Mi(ip)} = — rj 1 ^ 2 g^p(9) -t-C^nT 1 ). Thus, E{Mi(ip)} is of order 0(1); the expectation of the 
profile score does even vanish asymptotically. Adjusted profile likelihood is discussed in the Appendix. 
Most of the adjustment functions B(ip) that have been proposed to construct an adjusted profile log- 
likelihood M(^) = M(^) + B{^>) have the property that E{B 1 {^)} = g 1 ^ 2 g^p(9) + 0[n~ l ), so the 
expectation of the adjusted profile score is E{Mi('ip)} = 0(n~ l ), which does vanish asymptotically. 

For T(ip) = -R(VO = sgn(^ — ip)[2{M (^) — M (ip)}] 1 ^ 2 , as detailed in the Appendix, we have = 
—gwF(d) + 0(n -1 ) and k 3 = 0(n -1 ). Thus, at order 0(n -1 / 2 ), the difference between the distribution 
of R(vj) and the standard normal distribution depends on (?inf($)> a term which is the same whether 
there are no nuisance parameters present or there are orthogonal nuisance parameters. Consequently, we 
expect the difference between the distribution of R('<p) and the standard normal distribution not to grow 
inordinately as the number of nuisance parameters increase. 

Similar comments apply to Wald statistics and score statistics based on the adjusted profile log- 
likelihood function. For example, for T(^) — (ip — ip){— M 11 ('0)} 1 ' /2 , we have fiq = — {(?inf( 6 i ) + 
d} + 0(n -1 ) and k 3 = —6 d + 0( n -1 ), while for T(ip) = Mi(ip){— Mn(^)}' 1 ^ 2 , we have Ki = 
—{gmF(d) ~ 2 d} + (^(n" 1 ) and k 3 = 12 d + 0(n -1 ). 

Implementation of higher-order inference to error of order 0(n~ l ) requires that we estimate the adjust¬ 
ment terms |k 3 and —K\ + |k 3 ; we might, for example, use plug-in estimates or derive estimates from a 
simulation procedure such as the parametric bootstrap. If these adjustment terms change rapidly with the 
value of the parameter 6, then there is greater scope for error in the estimation process than if possible 
if the adjustment terms are stable across 9 values. This observation points to the use of asymptotically 
standard normal pivots T{E) that are derived from the adjusted profile log-likelihood function, since the 
adjustment terms for such pivots depend only on //inf( 60 and d. If the adjustment terms are small in 
magnitude, then they are unlikely to vary unduly with 9, and the adjustments can be estimated more 
reliably. Situations can arise, as is the case in the normal regression example, that the quantity g^p(9) is 
large yet it remains constant with respect to 9. In these circumstances, the need to use the adjusted pro¬ 
file log-likelihood is not so pressing; indeed, for the normal regression model, the parametric bootstrap 
affords exact inferences, except for simulation error. Since such situations are not commonplace, there 
is strong motivation for using generally procedures that ensure the magnitudes of the adjustment terms 
are controlled. However, it could be useful to develop conditions that easily identify models, such as the 
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normal linear regression model, for which the adjustment terms, especially gwp(9), are constant or nearly 
so, since, in such models, the benefit of using adjusted profile likelihood for accurate inference is not so 
pronounced and procedures based on the regular profile likelihood are likely to suffice. 

7. Discussion 

Accurate inference on a scalar interest parameter ip in the presence of a nuisance parameter may be 
obtained using the signed root likelihood ratio statistic R(ip). A computationally intensive, but analyti¬ 
cally simple, approach bases the inference on a bootstrap estimate of the sampling distribution of R(ip), 
constructed by fixing the nuisance parameter at its observed constrained maximum likelihood value. Al¬ 
ternatively, inference can be based on a standard normal approximation to the sampling distribution of 
an analytically adjusted version of R(ip). For this latter approach, the gold standard is represented by 
Barndorff-Nielsen’s R* statistic. The adjustment made by this statistic may be decomposed into a sum 
of two terms. These adjustments INF(^) and NP(u) are determined to second order, O p (n~ l ), by their 
expectations. 

We have provided an explicit evaluation of these expectations, allowing new theoretical interpretation 
of the relative importance of the two adjustments and to the intrinsic difficulty of the inference problem 
within any specified model. 

In particular, quantifying the dependence of the expectations on the nuisance parameter provides in¬ 
sight to circumstances where the bootstrap and analytic approaches might be expected to perform well 
in terms of accuracy, even in high dimensional problems and with small sample sizes. We have demon¬ 
strated that within a particular model, the importance of the nuisance parameter adjustment may depend 
not only on the structure of the model, as expressed by the nuisance parameter dimension, but the param¬ 
eter values themselves. In key problems, dependence lies only on the parameter dimension. Calculation 
of the approximations gmp(9) and <7 NP (0) of 77{INF('0)} and E{NP(ip)} involves only evaluation of 
expectations of low order log-likelihood derivatives, and has been demonstrated to give useful theoretical 
insight to the degree of the adjustment to the signed root statistic R(ip) given by the statistic R*(ip) for 
any specified inference problem, and therefore to the likely value in use of R*(ip) or bootstrapping as a 
means of improving accuracy. 

We note that empirical estimation of the means, through the bootstrap principle of estimation of the nui¬ 
sance parameter, furnishes a simple procedure for adjustment of the signed root likelihood ratio statistic. 
A thorough analysis of this empirical adjustment method for the purposes of inference with higher-order 
accuracy, as well as a comparison of such an empirical adjustment method with alternative approxima¬ 
tions, is beyond the scope of this paper. 


Appendix 

Adjusted profile likelihood. There have been many suggestions to replace the usual profile likelihood 
function M(ip) by an adjusted version M(ip) = M(ip) + B(ip), where B(ip) is an adjustment function 
whose derivatives with respect to ip are of order O p ( 1). The likelihood ratio statistic based on the adjusted 
profile likelihood is W(ip) = 2 {M ( ip ) — M(ip)}, where ip is the point at which M(ip ) is maximized. The 
signed root of the likelihood ratio statistic based on the adjusted profile likelihood is R(ip) = sgn (ip — 
iP){W(iP)} 1 / 2 . 
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Following our previous notation, we write Bi(ip) = dB{ip)/dip, Bu(^) = d 2 B (ip) / dip 2 , etc. Let 
/3i = E{B X { -0)}, /3 U = E(B n), etc.; these quantities are assumed to be of order 0(1). Further, let 
bi = Bi(ip) — (3i, bn = Bn(ip) — /3n, etc., with these quantities assumed to be of order O p (n -1 / 2 ). 
Assume also that the joint cumulants of nb\, nbn, l r > Ks, etc - are of order 0(n). 

In many instances, the adjustment function B(ip) has been proposed to take into account the effect 
of nuisance parameters for inference about ip; see, notably, Cox & Reid (1987), Barndorff-Nielsen 
(1983), Skovgaard (1996), Severini (1998), DiCiccio & Martin (1993), Barndorff-Nielsen & Cham¬ 
berlin (1994). These adjustment functions have the effect of reducing the expectation of the profile 
score from order 0(1) to order 0(n ~ 1 ). Specifically, these functions have d\ = [> + 0( n -1 ), where 
p = — p\ lr v st (\\ rst -\- Ks,t)- Since, in general, E{Mi(ip)} = —p + 0(n” 1 ), it follows that E{M 1 (ip)} = 
0(n -1 ): see McCullagh & Tibshirani (1990), DiCiccio et al. (1996). 

For a general adjustment function B(ip ), DiCiccio & Stern (1994b) showed that R(ip) = r] 1 ^ 2 {Ri + 
R 2 + O p (n _3//2 )}, where Ri = Ri = —A lr l r and R 2 = R 2 — A 11 /?!; in particular, R(ip) = R(ip) + 
V~ 1/2 /3 1 + O p (n~ l ). Below, we use this result with a particular adjustment function to obtain a repre¬ 
sentation of the nuisance parameter adjustment NP (ip), from which E{NP(ip)} is then determined to 
0(rT l ). Combined with ©, this enables calculation to 0(n _1 ) of i?{INF(^)}. 

Expectations of Adjustments. We have, 

E{NP(iP)} + E {INF (ip)} = —E{R(ip)} + 0(n _1 ) 

(4) = — ? 7 1//2 {A lr A s *A r . S) t + lA lr T st A rS!t + lA lr A st A rst + |A lr T st A rst } + 0(n *). 

It is easily seen that NP(-0) and INF(-0) are of the form NP(^) = 77{NP(^)} + O p (n _1 ) and INF(^>) = 
77{INF(^)} + O p (n _1 ). Here we develop explicit approximations for 77{NP(-0)} and i?{INF(^)}. 

The quantity NP(^) is related to the modified profile likelihood of Barndorff-Nielsen (1983), an ad¬ 
justed profile likelihood which reduces the bias of the profile score. Following Sartori et al. (1999) and 
Pierce & Bellio (2006), we have that, up to an additive constant, the log modified profile likelihood is 

L MP (ip) = —R(ip)NP(ip) - {RmV 2 

= -R(ip)NP(ip) - M(ip) + M(ip) 

= -l{R(iP) + NP(iP)} 2 + O p (n- 1 ). 

The modified profile likelihood therefore corresponds to an adjustment function of the form B(ip) = 
—R(ip)NP(ip). Further, the signed square root of the modified profile likelihood ratio statistic is equiva¬ 
lent, to O p (n -1 ), to R(ip) + NP (ip), as noted by Sartori et al. (1999). The general result of DiCiccio & 
Stern (1994b) then gives NP (ip) = rj~ x ^ 2 p5\ + O p (n _1 ). 

Observing that R(ip) — (ip — ip)i ) 1//2 + O p (n~ 1 / 2 ) and NP(ip) = NP(-0) + O p (n '), we have 

L MP {ip) = -R(ip)NP(ip) - M(ip) + M(ip) 

= (ip - ip)fj 1/2 NP(iP) - M{i i) + M(ip) + O p (n~ l ), 
and differentiation with respect to ip yields 

Lf p (iP) = f, 1 ' 2 NP(^) + M^iP) + O p (n~ l/2 ) = ri^NPiiP) + M x (iP) + O p (n~ l l 2 ). 


QUANTIFYING NUISANCE PARAMETER EFFECTS VIA DECOMPOSITIONS OF ASYMPTOTIC REFINEMENTS FOR LIKELIHOOD-BA! 


Since (see, for example, DiCiccio et al., 1996) E{L^ p (pp)} = 0(n x ) and E{Mi(i(j)} = — p + 
0(n -1//2 ), it follows that 

E{NP(^)} = T) ~ 1/2 p + 0{n~ l ) = -r] 1/2 \ lr is st (\ rs j + ±\ rst ) + 0(n _1 ), 

so that /?i = rj 1 / 2 E{ NP(^)} + 0(n~ x l 2 ) = —r]\ lr v st (\ rSit + rst ) + 0(n~ 1//2 ). It follows from © that 

E{ INF(^)} = p 1/2 \ lr r st {\\ rSjt + ±Kst) + Oin- 1 ). 

We observe also that this analysis confirms i7{NP(^)} = r/ -1 / 2 /?! + 0(n _1 ) = NP(-0) + O p (n _1 ), as 
noted earlier. 
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